FIGURE 3.11
The curves of the elements in M-Filter 1 ($M'_1$), M-Filter 2 ($M'_2$), M-Filter 3 ($M'_3$), and M-Filter 4 ($M'_4$) (in Fig. 3.2(a) and Eq. 3.12) on the CIFAR experiment during training. The values of the nine elements in each M-Filter are learned similarly to their averages (dotted lines). This validates that the special MCNs-1, with a single average element in each $M'_j$ matrix, is reasonable and compact without a large performance loss.
reconstructing full-precision convolutional filters from the binarized filters, which limits their use in computationally constrained environments. It has been theoretically and quantitatively demonstrated that simplifying the convolution procedure via binarized kernels, while approximating the original unbinarized kernels, is a very promising route to DCNN compression.
Although prior BNNs significantly reduce storage requirements, they also generally suffer significant accuracy degradation compared with networks that use full-precision kernels and activations. This is mainly due to two factors. First, CNN binarization is essentially a discrete optimization problem embedded in the backpropagation (BP) process, which previous work has largely neglected; discrete optimization methods can often guarantee the quality of the solutions they find and lead to much better performance in practice [66, 119, 127]. Second, the loss caused by binarizing CNNs has not been well studied.
We propose a new discrete backpropagation via projection (DBPP) algorithm to efficiently build our projection convolutional neural networks (PCNNs) [77] and obtain highly accurate yet robust BNNs. In our theoretical framework, we derive a projection loss by exploiting the ability of DBPP to perform discrete optimization for model compression. A further advantage of the projection loss is that it can be learned jointly with the conventional cross-entropy loss in the same backpropagation pipeline. The two losses are optimized simultaneously in continuous and discrete spaces and are optimally combined by the projection approach within a theoretical framework, which enriches the diversity of the learned filters and thus improves the modeling capacity. As shown in Fig. 3.12, we develop a generic projection convolution layer that can be used in existing convolutional networks. Both the quantized kernels and the projection are jointly optimized in an end-to-end manner. The projection matrices are optimized during training but are not used at inference, resulting in a compact and efficient learning architecture. As a general framework, other loss functions (e.g., the center loss) can also be incorporated to further improve the performance of our PCNNs via a progressive optimization method.
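To make the idea concrete, the following is a minimal PyTorch-style sketch, not the chapter's actual implementation, of a projection convolution layer in this spirit: the continuous kernel is projected onto binary values (here simply via sign with a straight-through estimator) for the convolution, and a projection loss penalizing the distance between the kernel and its projection is added to the cross-entropy loss. The class names and the trade-off weight lambda_p are hypothetical illustrations.

```python
# Minimal sketch of a projection convolution layer with a joint
# cross-entropy + projection loss (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignSTE(torch.autograd.Function):
    """Binarize with sign() in the forward pass; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: gradient of sign() is approximated by identity.
        return grad_output


class ProjectionConv2d(nn.Module):
    """Convolution that uses a projected (binarized) copy of its continuous kernel."""

    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.stride, self.padding = stride, padding

    def forward(self, x):
        w_proj = SignSTE.apply(self.weight)  # discrete (projected) kernel
        return F.conv2d(x, w_proj, stride=self.stride, padding=self.padding)

    def projection_loss(self):
        # Distance between the continuous kernel and its discrete projection.
        return F.mse_loss(self.weight, torch.sign(self.weight).detach())


# Joint optimization sketch (lambda_p is a hypothetical trade-off weight):
# loss = F.cross_entropy(logits, labels) + lambda_p * sum(
#     m.projection_loss() for m in model.modules()
#     if isinstance(m, ProjectionConv2d))
```

In this sketch the two objectives are optimized together by ordinary backpropagation: the cross-entropy term drives the continuous weights toward good predictions, while the projection term keeps them close to their discrete counterparts, loosely mirroring the joint continuous/discrete optimization described above.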
Discrete optimization is an active research topic in mathematics and is widely used to solve computer vision problems [119, 127]. Conventionally, a discrete optimization problem is solved by searching for an optimal set of discrete values with respect to minimizing a loss function. This chapter proposes a new discrete backpropagation algorithm that uses a projection
function to binarize or quantize the input variables in a unified framework. Due to the flex-